Scalable methods for computing state similarity in deterministic Markov Decision Processes
We present new algorithms for computing and approximating bisimulation
metrics in Markov Decision Processes (MDPs). Bisimulation metrics are an
elegant formalism that captures behavioral equivalence between states and
provide strong theoretical guarantees on differences in optimal behaviour.
Unfortunately, their computation is expensive and requires a tabular
representation of the states, which has thus far rendered them impractical for
large problems. In this paper we present a new version of the metric that is
tied to a behavior policy in an MDP, along with an analysis of its theoretical
properties. We then present two new algorithms for approximating bisimulation
metrics in large, deterministic MDPs. The first does so via sampling and is
guaranteed to converge to the true metric. The second is a differentiable loss
which allows us to learn an approximation even for continuous-state MDPs, which
prior to this work had not been possible.
Comment: To appear in Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20).
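For a rough sense of the computation involved: in a deterministic MDP, an on-policy bisimulation-style metric satisfies d(x, y) = |r(x) - r(y)| + γ·d(x', y'), where x' and y' are the successor states under the behavior policy. The sketch below iterates this operator to a fixed point on a tabular example; the names (`bisim_metric`, `rewards`, `nxt`) and the toy MDP are illustrative assumptions, not the paper's algorithm or code.

```python
import numpy as np

# A minimal sketch (not the paper's algorithm): fixed-point iteration for an
# on-policy bisimulation-style metric in a tabular, deterministic MDP.
# `rewards[s]` is the reward from state s under the behavior policy and
# `nxt[s]` its deterministic successor, both hypothetical placeholders.

def bisim_metric(rewards, nxt, gamma=0.9, tol=1e-8, max_iters=10_000):
    n = len(rewards)
    d = np.zeros((n, n))
    for _ in range(max_iters):
        # Operator: reward gap plus discounted distance between successors.
        d_new = np.abs(rewards[:, None] - rewards[None, :]) + gamma * d[nxt][:, nxt]
        if np.max(np.abs(d_new - d)) < tol:
            return d_new
        d = d_new
    return d

# Tiny 3-state chain: states 0 and 1 share rewards and successors,
# so the metric judges them behaviorally identical (distance 0).
rewards = np.array([1.0, 1.0, 0.0])
nxt = np.array([2, 2, 2])
print(bisim_metric(rewards, nxt))
```

The operator is a γ-contraction, which is the standard route to convergence guarantees for both the exact iteration above and sampled variants of the same update.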
A Comparative Analysis of Expected and Distributional Reinforcement Learning
Since their introduction a year ago, distributional approaches to
reinforcement learning (distributional RL) have produced strong results
relative to the standard approach, which models expected values (expected RL).
However, aside from convergence guarantees, there have been few theoretical
results investigating the reasons behind the improvements distributional RL
provides. In this paper we begin the investigation into this fundamental
question by analyzing the differences in the tabular, linear approximation, and
non-linear approximation settings. We prove that in many realizations of the
tabular and linear approximation settings, distributional RL behaves exactly
the same as expected RL. In cases where the two methods behave differently,
distributional RL can in fact hurt performance when it does not induce
identical behaviour. We then continue with an empirical analysis comparing
distributional and expected RL methods in control settings with non-linear
approximators to tease apart where the improvements from distributional RL
methods are coming from.
Comment: To appear in the Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19).
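To make the expected/distributional contrast concrete, here is a toy sketch of both update rules on a single tabular transition, in the spirit of categorical distributional RL (C51). The support `atoms`, the step size, the point-mass initialization, and all function names are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

gamma, alpha = 0.9, 0.1
atoms = np.linspace(0.0, 10.0, 51)   # fixed categorical support (C51-style)
dz = atoms[1] - atoms[0]

def expected_td(V, s, r, s_next):
    # Expected RL: move V(s) toward the scalar target r + gamma * V(s').
    V[s] += alpha * (r + gamma * V[s_next] - V[s])

def distributional_td(P, s, r, s_next):
    # Distributional RL: shift/shrink the successor distribution by
    # z -> r + gamma * z, project back onto the fixed support, then mix.
    target = np.zeros_like(atoms)
    for p, z in zip(P[s_next], atoms):
        tz = np.clip(r + gamma * z, atoms[0], atoms[-1])
        i = (tz - atoms[0]) / dz
        lo, hi = int(np.floor(i)), int(np.ceil(i))
        if lo == hi:
            target[lo] += p
        else:
            target[lo] += p * (hi - i)
            target[hi] += p * (i - lo)
    P[s] = (1 - alpha) * P[s] + alpha * target

n_states = 2
V = np.zeros(n_states)
P = np.zeros((n_states, len(atoms)))
P[:, 0] = 1.0                        # point mass at 0, so the means start equal

expected_td(V, s=0, r=1.0, s_next=1)
distributional_td(P, s=0, r=1.0, s_next=1)
print(V[0], (P[0] * atoms).sum())    # both approx 0.1: the means move together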
Small batch deep reinforcement learning
In value-based deep reinforcement learning with replay memories, the batch
size parameter specifies how many transitions to sample for each gradient
update. Although critical to the learning process, this value is typically not
adjusted when proposing new algorithms. In this work we present a broad
empirical study that suggests reducing the batch size can result in a
number of significant performance gains; this is surprising, as the general
tendency when training neural networks is towards larger batch sizes for
improved performance. We complement our experimental findings with a set of
empirical analyses towards better understanding this phenomenon.
Comment: Published at NeurIPS 2023.
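For orientation, the parameter in question sits in the replay sampling loop: every gradient step draws `batch_size` transitions from the replay memory. A minimal, hypothetical sketch follows; the names `replay`, `q_update`, and `train_step`, and the value 8, are placeholders rather than any particular agent's API.

```python
import random
from collections import deque

replay = deque(maxlen=100_000)   # replay memory of stored transitions
batch_size = 8                   # the knob under study; the paper finds
                                 # smaller values can improve performance

def q_update(batch):
    # Placeholder for one gradient step of a value-based deep RL agent.
    pass

def train_step():
    # Each update samples `batch_size` transitions uniformly from replay.
    if len(replay) >= batch_size:
        q_update(random.sample(replay, batch_size))
```

Because the batch size enters the agent only at this sampling step, it can be swept independently of the rest of the algorithm, which is what makes a broad empirical study of it tractable.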
Metrics and continuity in reinforcement learning
In most practical applications of reinforcement learning, it is untenable to
maintain direct estimates for individual states; in continuous-state systems,
it is impossible. Instead, researchers often leverage state similarity (whether
explicitly or implicitly) to build models that can generalize well from a
limited set of samples. The notion of state similarity used, and the
neighbourhoods and topologies it induces, are thus of crucial importance,
as they will directly affect the performance of the algorithms. Indeed, a number of
recent works introduce algorithms assuming the existence of "well-behaved"
neighbourhoods, but leave the full specification of such topologies for future
work. In this paper we introduce a unified formalism for defining these
topologies through the lens of metrics. We establish a hierarchy amongst these
metrics and demonstrate their theoretical implications on the Markov Decision
Process specifying the reinforcement learning problem. We complement our
theoretical results with empirical evaluations showcasing the differences
between the metrics considered.
Comment: Accepted at AAAI 2021.
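One way to make the theoretical implications concrete: a state metric d supports generalization when the value function is Lipschitz with respect to it, i.e. |V(x) - V(y)| <= K * d(x, y) for a small K. The sketch below, with made-up values and metrics (`V`, `d_discrete`, `d_behav` are all hypothetical), estimates the smallest such K for two candidate metrics; it illustrates the kind of comparison such a hierarchy supports, not the paper's experiments.

```python
import numpy as np

def lipschitz_constant(V, d):
    # Smallest K with |V(x) - V(y)| <= K * d(x, y) over all state pairs.
    gaps = np.abs(V[:, None] - V[None, :])
    mask = d > 0
    return float(np.max(gaps[mask] / d[mask])) if mask.any() else 0.0

V = np.array([2.0, 1.9, 0.0])               # hypothetical state values

d_discrete = 1.0 - np.eye(3)                # all distinct states at distance 1
d_behav = np.array([[0.0, 0.1, 2.0],        # a behaviorally informed metric:
                    [0.1, 0.0, 1.9],        # similar-value states are close
                    [2.0, 1.9, 0.0]])

print(lipschitz_constant(V, d_discrete))    # 2.0: the discrete metric is loose
print(lipschitz_constant(V, d_behav))       # 1.0: tighter control of value gaps
```

A smaller constant means states that the metric calls close really do have close values, which is the property generalization from limited samples relies on.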